ODN: Opening the Deep Network for Open-set Action Recognition
In recent years, the performance of action recognition has been significantly
improved with the help of deep neural networks. Most existing action
recognition works hold the "closed-set" assumption that all action categories
are known beforehand, so that deep networks can be well trained for these
categories. However, action recognition in the real world is essentially an
"open-set" problem: it is impossible to know all action categories beforehand,
and consequently infeasible to prepare sufficient training samples for
emerging categories. In this case, applying closed-set recognition methods
inevitably leads to unseen-category errors.
To address this challenge, we propose the Open Deep Network (ODN) for the
open-set action recognition task. Technically, ODN detects new categories by
applying a multi-class triplet thresholding method, and then dynamically
reconstructs the classification layer, "opening" the deep network by
continually adding predictors for new categories. To transfer the learned
knowledge to each new category, two novel methods, Emphasis Initialization and
Allometry Training, are adopted to initialize and incrementally train the new
predictor, so that only a few samples are needed to fine-tune the model.
Extensive experiments show that ODN can effectively detect and recognize new
categories with little human intervention, making it applicable to open-set
action recognition tasks in the real world. Moreover, ODN can even achieve
performance comparable to some closed-set methods.
Comment: 6 pages, 3 figures, ICME 201
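The two mechanisms the abstract names, unknown detection by thresholding and "opening" the classifier with a seeded new predictor, can be sketched as follows. This is a hedged toy in NumPy, not ODN's actual implementation: the threshold rule is a simplified stand-in for multi-class triplet thresholding, and the seeding in `open_network` is one plausible reading of Emphasis Initialization.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy closed-set classifier head: K known categories, feature dim D.
D, K = 8, 5
W = rng.normal(size=(K, D))   # one predictor (weight row) per known class
b = np.zeros(K)

def is_unknown(scores, thresholds):
    """Flag a sample as a new category when its top score falls below that
    class's acceptance threshold (a simplified stand-in for ODN's
    multi-class triplet thresholding, whose actual rule is richer)."""
    top = int(np.argmax(scores))
    return scores[top] < thresholds[top]

def open_network(W, b, new_feats):
    """'Open' the network by appending a predictor for a new category.
    The seeding below is a hypothetical Emphasis Initialization: the new
    row starts from the mean of the existing predictors plus the mean
    feature of the few new-category samples."""
    new_row = W.mean(axis=0) + new_feats.mean(axis=0)
    return np.vstack([W, new_row]), np.append(b, 0.0)

# A handful of samples from an emerging category triggers expansion.
new_feats = rng.normal(size=(3, D))
W2, b2 = open_network(W, b, new_feats)
print(W2.shape, b2.shape)
```

After expansion the head scores K+1 categories; in ODN the new predictor would then be fine-tuned on the few available samples.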
A Convolutional Long Short-Term Memory Neural Network Based Prediction Model
In recent years, the market demand for online car-hailing services has expanded dramatically. To satisfy daily travel needs, it is important to predict the supply and demand of online car-hailing accurately, and to perform active scheduling based on the predicted gap between supply and demand. This paper puts forward a novel supply and demand prediction model for online car-hailing, which combines the merits of the convolutional neural network (CNN) and long short-term memory (LSTM). The proposed model is named convolutional LSTM (C-LSTM). Next, the original data on online car-hailing were processed, and the key features that affect supply and demand prediction were extracted. After that, the C-LSTM was optimized by the AdaBound algorithm during the training process. Finally, the superiority of the C-LSTM in predicting online car-hailing supply and demand was demonstrated through comparative experiments.
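The prediction target the abstract describes, the gap between demand and supply, implies a windowed time-series setup before any CNN/LSTM model is trained. Below is a hedged sketch of that preprocessing step only; the paper's actual feature set is richer, and the window length here is an arbitrary illustration.

```python
import numpy as np

def make_gap_windows(supply, demand, window=3):
    """Build (X, y) pairs for next-step gap prediction.

    Hypothetical preprocessing for a C-LSTM-style model: the target is
    the next time step's demand-supply gap; each input is the previous
    `window` gap values.
    """
    gap = np.asarray(demand, dtype=float) - np.asarray(supply, dtype=float)
    X = np.stack([gap[i:i + window] for i in range(len(gap) - window)])
    y = gap[window:]
    return X, y

# Toy hourly supply/demand counts (illustrative numbers only).
supply = [90, 95, 100, 98, 97, 102, 110, 108]
demand = [100, 105, 99, 120, 115, 118, 125, 130]
X, y = make_gap_windows(supply, demand, window=3)
print(X.shape, y.shape)
```

Each row of `X` would then be fed through convolutional feature extraction before the LSTM, per the C-LSTM design.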
SODFormer: Streaming Object Detection with Transformer Using Events and Frames
The DAVIS camera, which streams two complementary sensing modalities of
asynchronous events and frames, has gradually been used to address major
object detection challenges (e.g., fast motion blur and low light). However,
how to effectively leverage rich temporal cues and fuse two heterogeneous
visual streams remains a challenging endeavor. To address this challenge, we
propose a novel streaming object detector with Transformer, namely SODFormer,
which is the first to integrate events and frames to continuously detect
objects in an asynchronous manner. Technically, we first build a large-scale
multimodal neuromorphic object detection dataset (i.e., PKU-DAVIS-SOD) with
over 1080.1k manual labels. Then, we design a spatiotemporal Transformer
architecture that detects objects via end-to-end sequence prediction, where a
novel temporal Transformer module leverages rich temporal cues from the two
visual streams to improve detection performance. Finally, an asynchronous
attention-based fusion module is proposed to integrate the two heterogeneous
sensing modalities and exploit the complementary advantages of each; it can be
queried at any time to locate objects, breaking through the limited output
frequency of synchronized frame-based fusion strategies. The results show that
the proposed SODFormer outperforms four state-of-the-art methods and our eight
baselines by a significant margin. We also show that our unified framework
works well even in cases where the conventional frame-based camera fails,
e.g., high-speed motion and low-light conditions. Our dataset and code are
available at https://github.com/dianzl/SODFormer.
Comment: 18 pages, 15 figures, in IEEE Transactions on Pattern Analysis and
Machine Intelligence
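The attention-based fusion of two heterogeneous streams can be illustrated with generic cross-attention: one modality's tokens attend over the other's. This is a minimal NumPy sketch of that generic mechanism, not SODFormer's actual fusion module; the token counts and feature dimension are arbitrary.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(q_feats, kv_feats):
    """Fuse one modality's tokens (queries) with another's (keys/values)
    via scaled dot-product attention. A generic sketch of cross-modal
    fusion, without the learned projections a real module would have."""
    d = q_feats.shape[-1]
    attn = softmax(q_feats @ kv_feats.T / np.sqrt(d), axis=-1)
    return attn @ kv_feats

rng = np.random.default_rng(0)
event_tokens = rng.normal(size=(4, 16))   # event-stream features (toy)
frame_tokens = rng.normal(size=(6, 16))   # frame-stream features (toy)
fused = cross_attention_fuse(event_tokens, frame_tokens)
print(fused.shape)
```

Because the event stream is asynchronous, queries can in principle be formed at any timestamp, which is what lets such a detector answer "where are the objects now?" between frame arrivals.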
Parsing Objects at a Finer Granularity: A Survey
Fine-grained visual parsing, including fine-grained part segmentation and
fine-grained object recognition, has attracted considerable attention due to
its importance in many real-world applications, e.g., agriculture, remote
sensing, and space technologies. Predominant research efforts tackle these
fine-grained sub-tasks following different paradigms, while the inherent
relations between the tasks are neglected. Moreover, since most of the
research remains fragmented, we conduct an in-depth study of the advanced work
from the new perspective of learning the part relationship. From this
perspective, we first consolidate recent research and benchmark syntheses
under new taxonomies. Based on this consolidation, we revisit the universal
challenges in fine-grained part segmentation and recognition tasks and propose
new solutions to these important challenges via part relationship learning.
Furthermore, we outline several promising lines of future research in
fine-grained visual parsing.
Comment: Survey of fine-grained part segmentation and object recognition;
accepted by Machine Intelligence Research (MIR)